Abstract

Background: GroESL is a heat-shock protein ubiquitous in bacteria and eukaryotic organelles. This evolutionarily conserved protein is involved in the folding of a wide variety of other proteins in the cytosol and is essential to the cell. The folding activity proceeds through strong conformational changes mediated by the co-chaperonin GroES and ATP. Functions alternative to folding have been previously described for GroEL in different bacterial groups, supporting enormous functional and structural plasticity for this molecule and the existence of a hidden combinatorial code in the protein sequence enabling such functions. Describing this plasticity can shed light on the functional diversity of GroEL. We hypothesize that different overlapping sets of amino acids coevolve within GroEL, within GroES and between both proteins. Shifts in these coevolutionary relationships may inevitably lead to the evolution of alternative functions.

Results: We conducted the first coevolution analyses in an extensive bacterial phylogeny, revealing complex 
networks of evolutionary dependencies between residues in GroESL. These networks differed among bacterial 
groups and involved amino acid sites with functional importance and others with previously unsuspected 
functional potential. Coevolutionary networks formed statistically independent units among bacterial groups and 
mapped to structurally continuous regions in the protein, suggesting their functional link. Sites involved in coevolution 
fell within narrow structural regions, supporting dynamic combinatorial functional links involving similar protein 
domains. Moreover, coevolving sites within a bacterial group mapped to regions previously identified as involved in 
folding-unrelated functions, and thus, coevolution may mediate alternative functions. 

Conclusions: Our results highlight the evolutionary plasticity of GroEL across the entire bacterial phylogeny. 
Evidence for the functional importance of coevolving sites illuminates the as yet unappreciated functional diversity 
of proteins. 



Background 

Heat-shock proteins, also known as molecular chaperones, belong to a highly conserved set of protein families that perform essential cellular functions in prokaryotes and eukaryotes [1]. These functions include, but are not limited to, protein folding, assembly, and transport [2-9]. While the folding function of GroEL has been extensively characterized, an emerging literature uncovers many alternative functions and structures for this protein (for a recent review see [10]). The mutations in this molecule that are responsible for the emergence of alternative functions remain uncharacterized. Therefore, the potential evolvability of this essential protein is largely unexplored.

GroES and GroEL, also known as cpn10 and cpn60 respectively, are expressed at constitutive levels under physiological conditions, and their expression increases at high temperatures, allowing the growth and survival of bacteria over a broad range of temperatures [11-13]. Both chaperonins are encoded by the groE operon. GroEL forms a homotetradecamer organized into two back-to-back rings, each comprising seven identical subunits. Each subunit is divided into three domains: the apical domain, which binds unfolded proteins and GroES; the intermediate domain, which acts as a hinge allowing the movement of the apical domain as well as the transition between the trans and cis conformations needed for GroEL function; and the equatorial domain, which is responsible for the ATPase activity and for the folding activity that takes place in the central cavity of the ringed complex [14-16].

The main function of GroEL has been considered to be the folding of other proteins in the cell [6,14,17-20], although evidence supports other folding-unrelated roles for GroEL, such as in the immune response in humans [21-23] or in growth and biofilm formation in bacteria, among others [24-30]. These functions are context dependent and may vary from one organism to another. Alternative functions may emerge in proteins after the duplication and evolution of their encoding gene or through amino acid replacements that impinge on the protein structure. The gene groEL has undergone many duplications in bacteria [2], adaptive evolution [31] and functional divergence [32]. Moreover, structural evolutionary changes have recently been described for GroEL: changes in the amino acid composition of its co-chaperonin GroES can determine whether GroEL functions as a single rather than a double ring [33].

The strong evolutionary sequence conservation of groEL and the high number of interactions it establishes with other proteins in the cell [13,34] contrast with GroEL's functional and structural plasticity and its propensity to persist in duplicate in some bacteria. Particularly striking is the fact that, while performing essential functions in the cell, GroEL presents alternative functions [10]. The trade-off between groEL's high conservation at the sequence and functional levels and its high propensity to evolve novel functions remains poorly understood.

Researchers have attempted to uncover GroEL's multifunctionality by testing the effects of directed mutagenesis of GroEL amino acids under laboratory-controlled conditions. However, the multifunctional nature of GroEL suggests the existence of a reservoir of functionalities resulting from the interaction between distinct sets of amino acids in different bacteria. Here we propose the hypothesis that the functional plasticity of GroEL is mediated by an evolutionary plasticity of potentially functional amino acids. In support of this hypothesis, bacteria growing under different physiological conditions present GroEL variants with functions alternative to folding, which involve different sets of amino acids. The strong selective constraints acting on GroEL imply important functional and structural links between amino acids. These links impose reciprocal selection pressures among amino acid sites. Therefore, changes in GroEL function from one bacterial group to another should be reflected in strong coevolutionary signatures between linked amino acids whose evolvability is co-regulated by selection in a particular bacterial clade.

In this study we performed an exhaustive coevolutionary analysis using an extensive bacterial phylogeny to uncover the evolutionary, and hence functional, dependencies among amino acid residues within GroES, within GroEL and between both proteins. The coevolutionary networks identified in these chaperonins from hundreds of bacteria reveal the complexity underlying the evolution of this essential protein and shed light on the functional importance of previously uncharacterized residues.

Results 

Sequence data and coevolution analyses 

To perform intra-protein coevolution analyses in GroES and GroEL, we searched for groE sequences among the major bacterial phyla and found that Actinobacteria, Cyanobacteria, Bacteroidetes and Chlorobi, Firmicutes, Proteobacteria, and Spirochaetes contained enough groE homologs to allow accurate inference of coevolution. The number of sequences ranged from 11 to 252 for groES genes, and from 12 to 278 for groEL genes, in Spirochaetes and Proteobacteria, respectively (Table 1). Despite these differences in the number of sequences, the mean amino acid sequence divergence was of the same order in all bacterial groups, ranging between 0.302 and 0.403, and these divergence levels were not correlated with the number of sequences in the alignment. These divergence levels also lie within the range that ensures robust results in coevolution analyses. Inter-protein coevolution analyses between groES and groEL were performed by building a pair of files for each group of bacteria, both files of the pair including the same bacterial strains. Accordingly, the size of the alignments used for the GroES-L inter-protein coevolution analyses ranged between 11 in Cyanobacteria and 215 in Proteobacteria (Table 1). All coevolution analyses were performed with a phylogenetic tree built using the tree-building function in CAPS, and pairs of coevolving sites were further filtered through a novel bootstrap analysis (see Methods). Therefore, the number of sequences in the alignment, the level of sequence divergence and the newly introduced filters were chosen to minimize the false-positive rate and increase the accuracy of our results.
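The measure behind the quoted mean divergence (0.302 to 0.403) is not specified in this section. As a minimal sketch, assuming an uncorrected p-distance (proportion of differing residues) with pairwise gap deletion, averaged over all pairs of sequences, the calculation looks like this; the function names and the toy alignment are hypothetical and are not data from this study.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues between two aligned sequences,
    skipping columns where either sequence has a gap (pairwise deletion)."""
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")


def mean_divergence(alignment):
    """Mean p-distance over all unordered pairs of aligned sequences."""
    distances = [p_distance(a, b) for a, b in combinations(alignment, 2)]
    return sum(distances) / len(distances)


# Hypothetical toy alignment (not sequences from this study).
toy_alignment = ["MKLV", "MKLI", "MRLI"]
print(round(mean_divergence(toy_alignment), 3))  # 0.333
```

Corrected distances (for example Poisson or empirical-matrix based) would give larger values at these divergence levels; the uncorrected version is used here only because it is the simplest to state.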

Evolutionary dependencies between functional sites within GroES and GroEL

To determine the magnitude of the evolutionary plasticity of GroEL and GroES, we first conducted a coevolutionary analysis to determine the network of residue dependencies in all bacteria. We performed intra-protein coevolution analyses on a GroES alignment of 519 sequences and a GroEL alignment of 505 sequences, representing the 6 major bacterial groups. We also calculated the support of each pair of coevolving sites, taking into account the phylogenetic relationships, using a non-parametric bootstrap approach (see Material and Methods for details). Throughout the text, amino acid site numbering and composition refer to the crystal structure of GroESL from E. coli (1AON.pdb).

Table 1 GroES (Cpn10) and GroEL (Cpn60) sequences used in our analysis

Groups                            Cpn10   Cpn60   Cpn10-Cpn60
Actinobacteria                       50      25            18
Aquificae                             5       3             -
Bacteroidetes/Chlorobi               29      26            25
Chlamydia/Verrucomicrobia            10       3             -
Chloroflexi                           5       4             -
Cyanobacteria                        29      13            11
Deinococcus-Thermus                   4       4             -
Dictyoglomi                           -       1             -
Elusimicrobia                         1       1             -
Fibrobacteres/Acidobacteria           3       1             -
Firmicutes                          110     118           102
Fusobacteria                          1                     -
Nitrospirae                           1                     -
Proteobacteria (α, β, γ, δ, ε)      252     278           215
Proteobacteria Unclassified           1       1             -
Spirochaetes                         11      12            10
Tenericutes                           1       8             -
Thermotogae                           5       6             -
Unclassified                          1                     -
All groups                          519     505           381

For the individual intra-group analyses we chose those bacterial groups with more than 10 sequences. For the overall Cpn10 and Cpn60 analyses we took all sequences (519 and 505, respectively).

We identified a single connected network of 16 coevolving amino acid sites in GroES, with Lys13, Leu27, Gly29, Thr36, Arg37, Glu39, Arg47 and Lys74 establishing most of the evolutionary dependencies (Figure 1a). To determine the importance of each amino acid site in the network (e.g., the amino acids establishing most of the connections), we applied to the coevolving sites three network centrality measures typically used in network biology: degree, betweenness and closeness. A network is a collection of points joined together in pairs by lines. In network jargon, points are referred to as vertices or nodes, while the links are referred to as edges. Centrality measures of nodes, including degree, betweenness and closeness, are typically used to determine the importance of these nodes in the network. The degree of a node is the number of edges connecting it to other nodes. A node has high closeness when its shortest-path distances to all other nodes in the network are short compared with those of the average node. A node has high betweenness when many of the shortest paths between all pairs of nodes in the network pass through it.
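The three centrality measures defined above can be illustrated on a small graph. This is a minimal plain-Python sketch, assuming an unweighted, undirected network in which each edge is one coevolving pair; closeness is computed as (number of reachable nodes minus one) divided by the sum of shortest-path distances, and betweenness follows Brandes' algorithm. The residue pairs are hypothetical and do not reproduce the network in Figure 1a, and this section does not state which software computed the published centralities.

```python
from collections import deque


def degree_centrality(adj):
    """Number of edges incident on each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes - 1) / sum of BFS shortest-path distances."""
    out = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        out[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return out


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs (unnormalized counts)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is visited from both ends.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical coevolving pairs, for illustration only.
pairs = [("Leu27", "Gly29"), ("Leu27", "Lys13"), ("Leu27", "Thr36"),
         ("Gly29", "Arg37"), ("Gly29", "Arg47"), ("Arg47", "Lys74")]
adj = {}
for a, b in pairs:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
print(max(betweenness, key=betweenness.get))  # Gly29
```

In this toy tree Leu27 and Gly29 tie on degree, but Gly29 separates more pairs of residues and therefore has the higher betweenness; this is the kind of distinction the three measures are meant to expose.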

Interestingly, Leu27 and Gly29, two amino acids known to be involved in the interaction between GroES and GroEL [35,36], are the most central in the coevolution network (Additional file 1: Figure S1a to c). The dependency of these two essential amino acids on other, functionally uncharacterized ones hints at possible functional links between both sets of amino acid sites. Indeed, Lys13, Thr36, Arg37, Glu39, Arg47 and Lys74, while lacking apparent functions, form a structural cluster establishing important contacts among GroES subunits (Figure 1b). Amino acid sites within each of the structural clusters were in close proximity to each other (for example, their proximal carbon atoms were less than 4 Å apart, against an average distance of 40 Å between all pairs of amino acids). Coevolution among structurally proximal amino acid sites is a general pattern [37] and suggests compensatory relationships, hence functional or structural links, between amino acids [38-40].
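The 4 Å proximity criterion can be made concrete: two residues are treated as proximal when the closest pair of their atoms is less than 4 Å apart. A minimal sketch, assuming atom coordinates have already been extracted from a structure file such as 1AON.pdb; the coordinates below are invented for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math


def min_atom_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance (in Å) between any atom of residue A
    and any atom of residue B; each residue is a list of (x, y, z) tuples."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def are_proximal(atoms_a, atoms_b, cutoff=4.0):
    """True when the closest atom pair lies below the cutoff (default 4 Å)."""
    return min_atom_distance(atoms_a, atoms_b) < cutoff


# Hypothetical atom coordinates (Å), not taken from 1AON.pdb.
res_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
res_b = [(4.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
res_c = [(40.0, 0.0, 0.0)]

print(min_atom_distance(res_a, res_b))  # 3.0
print(are_proximal(res_a, res_c))       # False
```

Restricting the atom lists to carbon atoms would match the "proximal carbon atoms" wording above; the function itself is agnostic to which atoms are passed in.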

In GroEL, we identified 21 coevolving amino acid residues (Figure 1c), of which Leu116, Ala127, Ser135, Arg231, Lys245, Gln319, Arg350, Ala443, and Asn487 were the most central to the network (Additional file 1: Figure S1d to f). Arg231, Val236, and Lys245 are involved in, or lie close to (less than 4 Å away in the structure), sites mediating substrate and GroES binding. Other positions were either included in, or close to, charged amino acid sites facing the central GroEL cavity (for example, Gln290, Val300, Lys311, and Arg350). Finally, Asn487 is located in the ATP and Mg2+ binding site, while other amino acid sites, such as Ala443 and Ala466, are at the ring interface and are likely involved in protein folding within the GroES-L ring complex. All 21 amino acids are distributed into two structural groups: one in the apical and another in the equatorial domain (Figure 1d). Remarkably, coevolving sites lie very close to sites involved in protein folding, substrate and GroES binding, ATP binding and hydrolysis, or inter-subunit contacts, suggesting that changes at these amino acids may have important functional consequences (Figure 1d).

Coevolution of GroES with GroEL 

The interaction of GroES and GroEL is essential to induce the conformational changes needed for the folding cycle. These conformational changes may force coadaptation dynamics between GroES and GroEL.

We performed coevolutionary analyses using the protein sequences of GroES and GroEL from the same set of bacterial strains (381 sequences for GroES and GroEL).



Ruiz-Gonzalez and Fares BMC Evolutionary Biology 2013, 13:156
http://www.biomedcentral.com/1471-2148/13/156



Figure 1 Coevolution analyses within GroES and GroEL. The network of coevolving amino acid sites within GroES is shown using the three-letter amino acid code (a). Sites coevolving within GroES were divided into two main structural clusters (b). One cluster includes two amino acid sites (blue spheres), which are involved in the interaction with GroEL. The second cluster includes residues (yellow spheres) mapping to the inter-GroES subunit faces. The network of coevolution in GroEL (c) identifies amino acid sites that are involved in the interaction with GroES and protein substrates (blue spheres in the structure of GroEL, d), sites involved in the inter-subunit GroEL contacts and substrate folding in the ring cavity (red spheres), residues with a role in ATP hydrolysis (green sphere) and those mapping to the inter-ring interfaces (black spheres).



These sequences span the six major bacterial groups of Table 1, all of them well represented. Analysis of coevolution identified a group of amino acids from GroES coevolving with GroEL (Figure 2a). The centrality measures of coevolving sites were also calculated (Additional file 2: Figure S2a to c). Coevolution did not affect GroES sites involved in the GroES-L interaction. Nonetheless, sites coevolving between both proteins had important functional roles and mapped to different functional domains of GroEL. For example, two of the GroEL sites, Ala260 and Arg268, are involved in the binding of substrates and overlap with sites involved in GroES binding as well [35]. In addition, Glu461, which takes part in the coevolutionary relationships of Ala260 and Arg268, has a role in stabilizing inter-ring contacts [41]. Since GroES is heavily involved in determining whether GroEL functions as a single or a double ring [33], the coevolution of Glu461 from GroEL with GroES amino acid sites may have implications for the structural stability of the double ring, and thus for the GroES-GroEL folding cycle.

In support of the structural and functional communication between the coevolving sites of GroES and GroEL, coevolving amino acids formed structural clusters within GroESL (Figure 2b). In addition to their clustering, coevolving sites were either functionally relevant or close to sites with reported functional importance. Taken together, these results support the hypothesis that the coevolutionary relationships are the result of selective constraints on amino acid sites that are structurally or functionally linked in the GroES-L complex.

Shifts of GroES-GroEL coevolutionary relationships during bacterial evolution

We tested whether the coevolutionary relationships among amino acid sites have changed among the different bacterial groups, which would indicate functional changes in GroES-L. Functional shifts in GroEL have been previously documented and linked to events of GroEL gene duplication [32] and to changes in organismal lifestyle [10,32]. However, a precise analysis of the sites potentially driving GroEL functional changes in major bacterial groups has not been conducted before.

Figure 2 Coevolution between GroES and GroEL. The network of residues involved in the evolutionary dependency between GroES and GroEL identifies 7 residues from GroES and 8 from GroEL (a). Structural mapping of the coevolving residues reveals their functional importance (b): residues coevolving between both proteins belong to substrate-binding regions and to inter-subunit and inter-ring contacts.

We identified evolutionary dependencies between amino acid sites that were specific to a particular bacterial group but absent from others. Previous studies have shown that the number of sequences in the alignment may undermine the accuracy of coevolution-detection methods [42]. To avoid such size-dependent effects, we performed bootstrap analyses of the coevolving pairs of sites (see Material and Methods). Amino acid sites identified as coevolving presented high bootstrap values (Additional file 3: Figure S3 and Additional file 4: Figure S4 for the coevolution results of GroES and GroEL, respectively). Amino acid sites detected in the coevolution analyses between GroES and GroEL (Additional file 5: Figure S5) were not detected in the intra-protein coevolution analyses, and thus were not the result of indirect evolutionary dependencies.
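The bootstrap filter itself is described in the Methods, which are not part of this section. As a minimal sketch of the general idea only, assuming that sequences (rows) are resampled with replacement and that the support of a pair is the fraction of replicates in which its covariation score stays above a threshold, the procedure could look like this. Mutual information is used as a stand-in score and is not the CAPS statistic; plain row resampling also ignores the phylogenetic structure that the authors take into account. The toy alignment and the threshold are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(col_i, col_j):
    """Plug-in mutual information (nats) between two alignment columns."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), count in Counter(zip(col_i, col_j)).items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def bootstrap_support(alignment, i, j, threshold, replicates=1000, seed=0):
    """Fraction of row-resampled replicates in which the score of the
    column pair (i, j) exceeds the threshold."""
    rng = random.Random(seed)
    n = len(alignment)
    hits = 0
    for _ in range(replicates):
        sample = [alignment[rng.randrange(n)] for _ in range(n)]
        col_i = [seq[i] for seq in sample]
        col_j = [seq[j] for seq in sample]
        if mutual_information(col_i, col_j) > threshold:
            hits += 1
    return hits / replicates


# Hypothetical alignment: columns 0 and 1 covary perfectly, column 2 does not.
alignment = ["ALK", "ALR", "GVK", "GVR"] * 5
print(bootstrap_support(alignment, 0, 1, threshold=0.1))
print(bootstrap_support(alignment, 0, 2, threshold=0.1))
```

A pair that owes its score to a handful of sequences tends to lose it in many replicates, which is why low-support pairs are the ones such a filter removes.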

Amino acid sites from GroEL coevolving with sites from GroES were centred in the apical and equatorial domains (Figure 3). While this was the general pattern when analysing the full alignment, this distribution varied significantly between bacterial clades. Figure 3 represents the distribution of coevolving sites in GroES and GroEL for each of the bacterial groups examined in this study. A brief inspection of the graph reveals sharp differences in the distribution of sites among the different domains of GroEL. For example, in Firmicutes, coevolving sites (yellow filled circles) concentrated mainly in the apical domain, in good agreement with the distribution of such sites when analysing the entire set of bacteria (red stars). Proteobacteria (purple filled circles) presented one set of coevolving sites in the apical domain and another in the C-terminal equatorial domain. Finally, in Actinobacteria (blue filled circles) all but one coevolving site were located in the C-terminal domain of GroEL.

The distribution of coevolving sites among GroEL secondary structures and domains also differed among bacterial groups. Figure 4 represents the expected and the observed numbers of the coevolving sites shown in Figure 3 in alpha helices, beta-strands and extended strands. The main differences in the distribution of coevolving sites among bacterial groups reside in the beta-strands. Beta-strands were significantly enriched for sites under coevolution in Proteobacteria, not enriched in other bacterial groups, and significantly depleted in Actinobacteria. These data are in good agreement with the functional and structural differences in GroEL found between Proteobacteria and Actinobacteria [10].
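The observed-versus-expected comparison of Figure 4 can be sketched as follows. Assuming that the expected number of coevolving sites in a secondary-structure class is proportional to the number of GroEL residues in that class, an exact binomial tail gives one way to call enrichment or depletion. The distribution actually used for Figure 4 is not named in this section, and the class lengths and counts below are hypothetical; only the 548-residue total matches the GroEL axis of Figure 3.

```python
from math import comb


def binom_tails(k, n, p):
    """Return (P(X <= k), P(X >= k)) for X ~ Binomial(n, p)."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    return sum(pmf[: k + 1]), sum(pmf[k:])


def enrichment(observed, class_lengths):
    """Compare observed coevolving-site counts per secondary-structure class
    with the counts expected if sites fell uniformly along the protein."""
    n = sum(observed.values())
    total_length = sum(class_lengths.values())
    result = {}
    for cls, k in observed.items():
        p = class_lengths[cls] / total_length
        lower, upper = binom_tails(k, n, p)
        result[cls] = {"expected": n * p, "P_depleted": lower, "P_enriched": upper}
    return result


# Hypothetical residue counts per class (summing to 548) and observed sites.
lengths = {"helix": 250, "strand": 120, "other": 178}
counts = {"helix": 5, "strand": 10, "other": 3}
for cls, stats in enrichment(counts, lengths).items():
    print(cls, round(stats["expected"], 2),
          f"{stats['P_enriched']:.3g}", f"{stats['P_depleted']:.3g}")
```

With these made-up numbers, 10 strand sites against about 3.9 expected gives a small upper-tail probability, which would be read as enrichment; a depleted class shows up instead in the lower tail.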

Coevolving sites are three-dimensionally proximal in the structure of GroES and GroEL. For example, GroES residues His7 and Asn68, identified in Actinobacteria, which are very close in the structure (the mean Euclidean distance between their proximal atoms is less than 4 Å), were coevolving with two sets of amino acids from GroEL. One set included Tyr478, Ala481 and Cys519, all three very close to one another in the equatorial domain of GroEL, and the other comprised Cys138 and His401, which were close to each other in the intermediate domain.

To determine the functional meaning of the groupings of coevolving sites in each bacterial clade, we performed two different analyses. First, we followed a previously published approach to define functional sectors in GroEL and GroES [43]. In that study, sectors are characterized by statistical independence, structural continuity, biochemical independence and divergence independence. Halabi and colleagues [43] showed that statistical protein sectors correspond to functional sectors. We tested three of the sector properties computationally: statistical independence, divergence independence and structural continuity. Second, we mapped sites identified as coevolving in one bacterial group but not in others onto protein regions known to have shifted GroEL function towards other, folding-unrelated functions in that bacterial group.

Figure 3 Identifying shifts in coevolutionary links between amino acids in the different bacterial clades. We have analysed the coevolution of GroES and GroEL in the different bacterial clades (colour-coded circles) and compared the residues involved with those identified across the entire bacterial phylogeny (stars). The distribution of the coevolving residues along GroEL is shown on the X-axis, while their distribution in GroES is shown on the Y-axis. The continuous bar at the very bottom of the figure represents the three major domains of GroEL (Apical: blue, Intermediate: yellow and Equatorial: red). On top of the continuous bar we have also marked regions reported to be involved in folding-independent functions. These regions are colour-coded as in [10]: 1, 3 and 11 (orange), binding to mouse adipocytes; 2 and 12, binding to potato leafroll virus; 4, insecticidal neurotoxin; 5, monocytes and T-cell activators; 6, binding to primary mouse macrophages; 7 and 9, binding to lipopolysaccharides; 8, insecticidal toxin; 10 and 13, binding to the cell surface of J774A.1 cells; 14, monocyte modulation activity.

Groups of coevolution form protein sectors statistically independent among bacteria

Functional links between sites impose correlation in their entropies [43]. To test this, we measured the amount of conservation ($D_i$) for the sites of each GroEL protein domain as a function of entropy (see Material and Methods for details). Then, we calculated the correlation entropy ($I_s$) for each group of coevolving sites (see Material and Methods). To determine whether the group of coevolving sites within a bacterial clade is independent from that of another bacterial clade, we compared the correlation entropy of groups from different bacterial clades for each of the GroEL domains. Three domains (apical, equatorial and intermediate) were compared between bacterial groups. If the change in the site composition of coevolution networks is the result of functional shifts between bacteria, sites within a network in one bacterial group ($g_1$) should correlate in their entropies ($I_s$) more with each other than with any of the sites of the network of the other bacterial group ($g_2$). That is, the entropy correlation of one group should be independent of that of the other group:

$$I_s(g_1, g_2) \approx I_s(g_1) + I_s(g_2)$$

A main difference between our approach and that of the previous study [43] is that sectors in our approach are defined based on coevolution analyses derived from CAPS, while those of Halabi and colleagues [43] were identified using statistical coupling analysis (SCA) to determine the contribution of correlations to conservation profiles.

Analyses of correlation entropies showed that all groups of coevolving sites within the equatorial domain for a bacterial group were independent from those in other bacterial groups (Figure 5a) (e.g., comparison of $\theta = I_s(g_1, g_2) - (I_s(g_1) + I_s(g_2))$ from the real groups with a set of 1000 pseudorandom replicates yielded no significant difference between the two groups, $g_1$ and $g_2$). The same was inferred for the groups of coevolving sites from the intermediate domain of GroEL. Conversely, in the apical domain we found independent groups of coevolution for all bacterial groups with the exception of Spirochaetes, in which $I_s(g_1, g_2)$ was much smaller than $I_s(g_1) + I_s(g_2)$ (Figure 5a). Comparison of the mean differences ($\theta$) indicates that the equatorial domain showed the strongest signal of functional sector independence among bacterial strains, followed by the intermediate and apical domains (Figure 5b). These differences were not, however, statistically significant under a Wilcoxon rank test.

Figure 4 Distribution of coevolving sites amongst secondary structures in GroEL. The observed number of sites within each structure (colour-coded bars according to the bacterial group) was compared to the expected number of such sites using a distribution. Significant values (P < 0.05) are indicated with black stars.
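The independence test on $\theta$ can be sketched end to end, with one important substitution: equations 1 to 4 defining the correlation entropy $I_s$ are in the Methods and are not reproduced here, so this sketch uses total correlation (multi-information: sum of column entropies minus the joint entropy) as a stand-in. Under that stand-in, $\theta$ equals the mutual information between the two groups, so it is never negative and is about zero for independent groups; the authors' measure may behave differently (for example, in the Spirochaetes case above). The null distribution of 1000 random groups drawn from the same domain and the normal test follow the Figure 5 caption; the binary alignment and site indices are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Shannon entropy (nats) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def total_correlation(alignment, sites):
    """Stand-in for I_s: sum of marginal column entropies minus joint entropy."""
    marginal = sum(entropy([seq[i] for seq in alignment]) for i in sites)
    joint = entropy([tuple(seq[i] for i in sites) for seq in alignment])
    return marginal - joint


def theta(alignment, g1, g2):
    """theta = I(g1, g2) - (I(g1) + I(g2)); about 0 for independent groups."""
    return (total_correlation(alignment, list(g1) + list(g2))
            - total_correlation(alignment, g1)
            - total_correlation(alignment, g2))


def null_test(alignment, g1, g2, domain_sites, replicates=1000, seed=0):
    """Observed theta versus theta of random same-size groups from the domain.
    Returns (observed, null_mean, z, two-sided normal P-value)."""
    rng = random.Random(seed)
    observed = theta(alignment, g1, g2)
    null = []
    for _ in range(replicates):
        picked = rng.sample(domain_sites, len(g1) + len(g2))
        null.append(theta(alignment, picked[:len(g1)], picked[len(g1):]))
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (len(null) - 1))
    z = (observed - mean) / sd
    return observed, mean, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical binary alignment: 200 sequences x 20 domain sites.
# Sites 0-3 share one random pattern; the other 16 sites are independent.
gen = random.Random(1)
shared = [gen.choice("AG") for _ in range(200)]
alignment = ["".join([shared[k]] * 4 + [gen.choice("AG") for _ in range(16)])
             for k in range(200)]

observed, null_mean, z, p = null_test(alignment, [0, 1], [2, 3], list(range(20)))
print(f"theta={observed:.3f} null_mean={null_mean:.3f} z={z:.2f} P={p:.3g}")
```

Here the two groups are deliberately built from linked columns, so $\theta$ stands well above the random-group null; two genuinely independent sectors would instead give a $\theta$ close to the null mean.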

Groups of coevolution present structural continuity 

To determine whether the sites within a coevolution group were linked structurally within a bacterial clade, we plotted them onto the crystal structure of the E. coli GroESL protein complex. Figure 6 presents evidence of the structural clustering of sites within each of the bacterial groups in the three protein domains. Importantly, the coevolutionary shifts between bacterial groups are apparent, and their structural mapping provides insights into the possible functional differences among the groups of coevolving residues. A remarkable observation is that amino acids that coevolved in one group of bacteria are located on a completely different structural face from those detected in another group of bacteria, while both retain structural continuity. As a case in point, the alpha helices populated with coevolving amino acids in Proteobacteria are independent from those in Actinobacteria. This rule applies to both the equatorial and the apical domains (Figure 6a and f). In addition to the difference in structural patterns, Proteobacteria present coevolving amino acids in regions involved in protein folding, while in Actinobacteria they mostly affect the subunit surfaces mediating the inter-ring contacts. This differential distribution supports functional shifts between both bacterial clades, with one having a larger effect on folding and the other on the stability of the GroEL double-ring complex. Another striking example of functional and structural differentiation is that of Spirochaetes, with most of the coevolving amino acids mapping to the inter-ring regions of the equatorial domain (Figure 6d).

Coevolution of GroEL sites with folding-independent functions

GroEL regions responsible for functional differences among bacteria are reported in Figure 4 of [10]. We compared the sites coevolving in one bacterial clade but not in another and plotted these sites onto the different domains known to confer alternative, non-folding functions on GroEL. Many of the sites involved in a coevolutionary relationship in a bacterial group have been reported to be involved in a GroEL function alternative to protein folding (Figure 3). For example, two of the coevolving sites in Actinobacteria are directly involved in monocyte modulation by the actinobacterium Mycobacterium tuberculosis ([44]; Figure 3). Moreover, a number of the amino acids identified as coevolving exclusively in Proteobacteria map to a region of GroEL previously found to bind potato leafroll virus and to facilitate its movement in the plant [45,46] (Figure 3). The extensive list of coevolving amino acid sites mapping within these folding-alternative functional regions (Figure 3) is testament to the important implications of groups of coevolution for the functional plasticity of GroEL.
Figure 5 Groups of coevolving sites correlate in their entropies, forming independent protein sectors. We measured entropy and correlation entropies for each pair of groups belonging to different bacteria using equations 1 to 4 from the text. Compared groups were taken from the same protein domain (apical, intermediate or equatorial). Bacterial groups compared included Actinobacteria (a), Bacteroidetes (b), Cyanobacteria (c), Spirochaetes (d), Firmicutes (e) and Proteobacteria (f). Two groups of coevolution ($g_1$ and $g_2$) were considered independent when the joint correlation entropy for the groups, $I_s(g_1, g_2)$, was approximately equal to the sum of the correlation entropies $I_s(g_1)$ and $I_s(g_2)$. The significance of the difference between these two quantities, $\theta = I_s(g_1, g_2) - (I_s(g_1) + I_s(g_2))$, was tested against a null distribution of $\theta$ drawn from 1000 groups built by randomly sampling sites from the same protein domain. Significant $\theta$ values under a normal test (P < 0.05) are indicated with *.



Discussion 

Complex coevolutionary networks in GroESL define the functional boundaries of amino acid sites

Our analyses of the coevolutionary dynamics within 
GroES and GroEL as well as between both these 
interacting proteins uncover a complex network of evo- 
lutionary dependencies among amino acid sites. These 
dependencies often involve sets of sites with known 
functional relevance but also comprise other sites with 
unknown importance. However, the functional import- 
ance of these untested sites is supported by a number of 
observations and tests made in this study. First, we show 
that most amino acids involved in coevolutionary dy- 
namics are three-dimensionally clustered in the protein 
structure and closely located to functionally or structur- 
ally important sites. As a case in point, functionally im- 
portant sites in GroES present the largest centrality 
values in GroES coevolutionary network, indicating their 
greater evolutionary dependencies with other sites 
closely located in the protein structure. The coevolution 



of sites surrounding important functional regions may 
compensate the effects of mutations at these functional 
sites or near functional and catalytic pockets, thereby 
maintaining an overall volume or shape for that pocket 
[37]. Our results on the proximity of coevolving sites to 
functional domains support previous studies claiming 
that covarying groups of amino acid sites are often iden- 
tified at critical protein regions [37,40,47-52]. Second, 
covarying amino acid sites identified in this study are 
part of networks that correspond to structural clusters, 
that is, these sites fall close to each other in the protein 
structure. Taken together, the low number of sites identified in our coevolutionary analyses, their structural clustering, and their proximity to functional or protein interface regions point to their functional or structural importance.
This is supported by previous studies indicating that sites 
coevolving with few others within the protein are likely to 
represent functional dependencies [49,53,54]. 
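The centrality measure used for the GroES network is not specified in this section. The following Python sketch assumes plain degree centrality (number of coevolving partners divided by the number of other sites in the network) and uses hypothetical site numbers, only to illustrate how the sites with the most evolutionary dependencies can be ranked from a list of coevolving pairs.

```python
from collections import defaultdict


def degree_centrality(pairs):
    """Normalized degree centrality of each site in an undirected coevolution network.

    pairs: iterable of (site_a, site_b) tuples, one per coevolving pair.
    Returns {site: number_of_partners / (number_of_sites - 1)}.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # a site cannot coevolve with itself
        partners[a].add(b)
        partners[b].add(a)
    n_sites = len(partners)
    if n_sites < 2:
        return {site: 0.0 for site in partners}
    return {site: len(nb) / (n_sites - 1) for site, nb in partners.items()}


# Hypothetical coevolving pairs; site numbers are illustrative only
pairs = [(17, 25), (17, 30), (17, 33), (25, 30), (40, 41)]
centrality = degree_centrality(pairs)
ranked = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0])  # (17, 0.6)
```

Other measures (betweenness, closeness) could rank sites differently; degree is used here only because it is the simplest.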

Figure 6 Distribution of groups of coevolving sites within the three domains of the crystal structure of GroEL (1AON.pdb). We compared these distributions using a GroEL2-GroES2 dimer. The groups of bacteria represented are Actinobacteria (a), Bacteroidetes (b), Cyanobacteria (c), Spirochaetes (d), Firmicutes (e) and Proteobacteria (f). Sites under coevolution are highlighted as solid spheres, with those belonging to the same group colour-coded. Sites falling within the apical, intermediate and equatorial domains are coded with the colours blue, yellow and red, respectively.

Most covarying amino acid sites in GroEL were identified in the equatorial and apical domains, and only a few sites were located in the intermediate domain. Apical
and equatorial domains perform most functions in 
GroEL. Notably, many of the amino acids from the equatorial domain involved in coevolutionary relationships belong to the extreme C-terminal tail of GroEL. This tail is relevant to folding: substrates fold within the central GroEL cavity, a process favoured by the limited size and hydrophobicity of the cavity [6,20], and the C-terminal tail of GroEL defines the hydrophobicity of the environment within this cavity, which would affect both the size and the nature of the substrate proteins folded by the chaperonin [55]. Collectively, our results uncover a list of amino acid sites that might have profound implications for the functions of GroES and GroEL.

The evolutionary dependencies between GroES and GroEL 
provide information on the structural consequences of 
their interaction 

Our coevolutionary analyses in GroES and GroEL identi- 
fied several sets of sites with apparently distinct roles. 
First, GroES amino acid regions coevolving with residues from GroEL are all located at the interface between the GroES subunits. Second, GroEL residues coevolving with GroES are distributed among the three domains: apical, intermediate and equatorial. In the apical domain, two
amino acid residues coevolving with GroES are involved 
in substrate binding. One site is located at the interface between the two GroEL heptameric rings and may be involved in stabilizing the contact between them. Indeed, the folding reaction cycle requires the double ring of GroEL, in which information passes between the rings to signal the progress of ATP hydrolysis in one ring, causing important conformational changes in the opposite ring [56,57]. One such change is the weakening of GroES-GroEL binding, a process completed by the binding of ATP to the opposite ring [58]. The inter-ring amino acid contacts are, therefore, essential for completion of the folding cycle and for the release of GroES from the cis ring once ATP has bound to the opposite ring. Arguably, coevolution between the ring interface and GroES may result from the constraints to maintain structural communication between the two GroEL rings upon interaction with GroES.

Amino acid coevolution underlies the functional plasticity
of GroES and GroEL in bacteria 

Our results bring forward the controversial, although intuitive, suggestion that the function of a protein may change over evolutionary time, leading to a plastic fitness landscape in which constraints on amino acids can vary dramatically. Against the static view of "one protein, one function", we propose that proteins have the
potential to perform many alternative functions. Leaping 
from one function to another requires the correlated 
evolution of key amino acids in the protein. GroEL, and 
its co-chaperonin GroES, offer a unique system to test 
this hypothesis because, despite being essential to the cell, GroEL has evolved many alternative functions in different bacterial groups [21-30]. The performance of alternative functions depends on the fixation of mutations in genes. Since amino acids are constrained by their interactions with other amino acids, fixation of mutations at sites with functional relevance must be accompanied by mutations at other sites of the protein through molecular coadaptation dynamics; that is, amino acids that are structurally or functionally linked exert reciprocal natural selection on one another [59].

The groups of amino acids identified in the intra- 
protein and inter-protein coevolution analyses differed 
between bacterial groups, in good agreement with the 
apparent differences in GroEL function among these bacteria. Groups of coevolving amino acids in one domain of a bacterial group were statistically and structurally independent of those in the same domain of another bacterial group. Many of the coevolution groups found in one bacterial group map to regions of GroEL that are known to be involved in functions alternative to protein folding. Other coevolving amino acids could not be directly mapped to domains with known alternative functions, although their structural proximity to these domains hints at potential roles for these sites. Remarkably, the set of
amino acid sites involved in an evolutionary dependency 
in one bacterial group was close in the protein structure 
to the set of amino acids detected for another bacterial 
group. In fact, in some cases, the same amino acid was 
detected as coevolving with different sets of amino acids 
in two bacterial groups, thereby acting as evolutionary 
hinges of alternative functional protein sectors. For ex- 
ample, in the intra-GroEL coevolution analysis, Met514 
was detected in Actinobacteria and Bacteroidetes, but it 
was coevolving with different amino acids in these two 
groups. The general trend was that alternative sets of 
coevolving sites identified in different bacteria were 
closely located in the structure. This supports the hypothesis that shifts in the selective constraints on amino acid sites of GroEL are subtle between bacteria and affect the same structural regions, probably those regions undergoing conformational changes when GroEL
interacts with GroES. 
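The notion of a hinge site, one detected in two bacterial groups but with different coevolving partners (as described for Met514), can be made concrete with a small set comparison. This is an illustrative Python sketch, not the analysis pipeline used in this study; the pair lists are hypothetical and the partner sites paired with 514 are invented for the example.

```python
from collections import defaultdict


def partner_sets(pairs):
    """Map each site to the set of sites it coevolves with."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return partners


def hinge_sites(pairs_group1, pairs_group2):
    """Sites coevolving in both bacterial groups but with different partner sets."""
    p1, p2 = partner_sets(pairs_group1), partner_sets(pairs_group2)
    shared = sorted(set(p1) & set(p2))
    return {site: (p1[site], p2[site]) for site in shared if p1[site] != p2[site]}


# Hypothetical pairs: the partners of 514 are invented, not results of this study
group_a = [(514, 100), (514, 220), (300, 301)]
group_b = [(514, 460), (300, 301)]
print(hinge_sites(group_a, group_b))
```

Site 300 is shared by both groups but keeps the same partner, so only 514 is reported as a hinge.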

To conclude, we provide evidence of the plasticity of 
the evolutionary relationships between the amino acid 
sites in an essential protein. We also list a set of 
coevolving sites that might be worth testing to address important questions regarding the functional promiscuity of GroEL and its evolvability under different
conditions. Experimental studies aimed at determining 
the importance of the amino acid sites listed in this 
study may aid the development of mechanistic models 
of protein folding in the cell and the evolution of alter- 
native functions from highly conserved ones. 

Conclusions 

Our results map genetic diversity in GroESL to its func- 
tional promiscuity. While different functional sectors in 
GroESL can be assigned to distinct functions, the overlap in the amino acid sets of these sectors suggests that functional leaps in proteins can be driven by subtle differences in sequence composition.
Our results highlight the evolutionary plasticity of GroEL 
across the entire bacterial phylogeny. Evidence on the 
functional importance of coevolving sites illuminates the 
as yet unappreciated functional diversity of proteins. 

Methods 

Sequences, alignments and phylogenetic inference 

All GroES and GroEL (also known as cpn10 and cpn60, respectively) sequences were downloaded from the OMA browser site (http://omabrowser.org). We used either cpn10 or cpn60, together with Rhizobium, as keywords. We then chose the link to the page with the highest number of orthologs: RHIL300891 (Q1MKX3), with 903 orthologs (01/04/2011), for cpn10, and RHIL300890 (CH601_RHIL3), with 870 orthologs (23/03/2011), for cpn60. We removed all eukaryotic and archaeal sequences prior to the analysis.
Then, we aligned all sequences using ClustalX2 [60,61]. 
The output alignment was manually refined using 
GeneDoc 2.6 [62] and this new alignment was used to 
build a neighbor-joining tree with 1000 bootstrap repli- 
cates in ClustalX2. The trees were visualized with FigTree 
1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/), and all redundant sequences (identical amino acid sequences) were detected and deleted, leaving one representative sequence. Then, we removed the sequences belonging to duplicated genes within each species, ending with a final alignment that included 519 sequences for cpn10 and 505 sequences for cpn60 (see Table 1). We used CAPS [50] to analyse the intra-protein coevolution clustering of amino acids for both the cpn10 and cpn60 alignments. For both alignments we used a threshold α value of 0.001, a random sampling of 100,000, and a bootstrap value of 100. In addition to these two alignments, we prepared new alignments for those taxonomic groups with at least 10 sequences for both the cpn10 and cpn60 proteins (sample sizes in Table 1): Actinobacteria, the Bacteroidetes/Chlorobi group, Cyanobacteria, Firmicutes, all Proteobacteria together, and Spirochaetes. In these analyses the bootstrap values were adapted to the sample sizes (20, 80, 100, 20, 10, and 9, respectively).
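The redundancy filter and the group-size criterion described above can be sketched in Python as follows. The record format (identifier, taxonomic group, sequence) and the function names are hypothetical; the sketch treats sequences as redundant when their ungapped amino acid strings are identical and keeps the first one seen as the representative. It does not reproduce the removal of duplicated genes within species.

```python
from collections import Counter


def remove_redundant(records):
    """Keep one representative per distinct amino acid sequence.

    records: list of (identifier, taxonomic_group, sequence) tuples (hypothetical format).
    Sequences are compared ungapped and case-insensitively; the first one seen is kept.
    """
    seen = set()
    kept = []
    for rec_id, group, seq in records:
        key = seq.replace("-", "").upper()
        if key not in seen:
            seen.add(key)
            kept.append((rec_id, group, seq))
    return kept


def groups_for_subset_analysis(cpn10, cpn60, min_size=10):
    """Taxonomic groups with at least min_size sequences in BOTH datasets."""
    n10 = Counter(group for _, group, _ in cpn10)
    n60 = Counter(group for _, group, _ in cpn60)
    return sorted(g for g in n10 if n10[g] >= min_size and n60[g] >= min_size)


# Toy example: "b" and "c" duplicate "a" once gaps and case are ignored
records = [("a", "Firmicutes", "MNIRPL"), ("b", "Firmicutes", "MNI-RPL"),
           ("c", "Firmicutes", "mnirpl"), ("d", "Spirochaetes", "MKIRPL")]
print([rec_id for rec_id, _, _ in remove_redundant(records)])  # ['a', 'd']
```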
To conduct the coevolution analysis between GroES and GroEL, we built multiple sequence alignments for both proteins comprising the sequences from the same organismal source (a total of 381 sequences for GroES and GroEL; Table 1). We downloaded the sequences of the crystallized cpn10 and cpn60 proteins of Escherichia coli (PDB ID: 1AON, MMDB ID: 47936) from the NCBI site (http://www.ncbi.nlm.nih.gov/sites/structure) to map the coevolving amino acid sites detected using CAPS onto the protein structure. Since the positions of the coevolving sites reported by CAPS correspond to positions in the input alignment, which included gaps, we wrote a script in C++ (Microsoft Visual C++ Standard Edition 6.0, available from the authors upon request) to identify the coevolving sites in the sequence of the published structure of the protein. The networks of coevolving amino acids were built using Cytoscape 2.8.2 [63]. The crystal structure of the GroESL complex was represented using the software iMol (P. Rotkiewicz, http://www.pirx.com/iMol/index.shtml).
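The C++ script itself is not published. The following Python sketch shows one way such a column-to-residue mapping can work, assuming that CAPS reports 1-based alignment columns and that the structure sequence appears as a gapped row of the same alignment. The function name and the `first_residue` offset (for structures whose numbering does not start at 1) are illustrative assumptions.

```python
def column_to_residue(aligned_ref, first_residue=1):
    """Map 1-based alignment columns to residue numbers of the structure sequence.

    aligned_ref:   the structure's sequence as it appears in the alignment, with '-' gaps.
    first_residue: residue number of the first non-gap character (assumed 1 by default).
    Columns where the structure sequence has a gap map to None.
    """
    mapping = {}
    residue = first_residue - 1
    for column, char in enumerate(aligned_ref, start=1):
        if char == "-":
            mapping[column] = None
        else:
            residue += 1
            mapping[column] = residue
    return mapping


# Toy gapped reference row and hypothetical CAPS columns
ref_row = "M-AAK--NV"
mapping = column_to_residue(ref_row)
caps_columns = [1, 5, 6, 9]
print([mapping[c] for c in caps_columns])  # [1, 4, None, 6]
```

A column that maps to None is a site with no counterpart in the crystallized protein and cannot be drawn on the structure.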

Coevolution analyses 

Coevolution analysis, that is, the analysis of correlated variation of two amino acid sites throughout the multiple sequence alignment, was performed using a previously published coevolution method [64] implemented in the program CAPS [50]. Mutual information methods were also tested, but their performance was considerably poorer: they yielded large sets of sites and false-positive results, in agreement with a previous study [64]. Briefly, the CAPS method estimates how correlated the evolutionary variability is at two sites of the same or different protein-coding multiple sequence alignments. To account for the strength of the amino acid transitions at a site, the BLOSUM score of the amino acid transition at a site between two sequences was corrected by the time since the divergence of the two sequences compared. Time of divergence was calculated using Li's corrected number of synonymous nucleotide substitutions. Phylogenetic artifacts (phylogeny asymmetries, long-branch attraction, and unequal codon and base composition biases among the bacterial clades) were accounted for by conducting the
same coevolution analyses in a set of neutrally evolving 
simulated alignments, which bear the same evolutionary 
features as the real sequence alignments. A pair of sites 
was considered to coevolve if the probability of their cor- 
relation coefficient was lower than 0.001 when compared 
to the null distribution of such coefficients drawn from 
the simulated sequence alignments. Moreover, to identify 
coevolving pairs of sites that may be functionally or struc- 
turally linked across the bacterial phylogeny, we con- 
ducted non-parametric bootstrap analyses of covariation 
(see next section). 
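A simplified Python sketch of the correlation at the core of this approach follows. It assumes that each pairwise substitution score is divided by the pairwise divergence time and that two sites are compared with a Pearson correlation; the exact correction in CAPS may differ. The toy scoring and divergence-time functions are hypothetical stand-ins (they are not BLOSUM62 values or synonymous distances), and, because simulating neutral alignments is beyond a short example, the null distribution here comes from permuting one site, which is a different null from the simulated alignments used in CAPS.

```python
import itertools
import math
import random


def site_scores(column, div_time, score):
    """Divergence-corrected similarity for every pair of sequences at one site.

    column:   residues at this site, one per sequence
    div_time: function (s, t) -> divergence time between sequences s and t
    score:    function (a, b) -> substitution score for residues a and b
    """
    return [score(column[s], column[t]) / div_time(s, t)
            for s, t in itertools.combinations(range(len(column)), 2)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def empirical_p(observed, null):
    """Share of null |r| values at least as large as the observed |r| (with +1 correction)."""
    hits = sum(abs(r) >= abs(observed) for r in null)
    return (hits + 1) / (len(null) + 1)


def toy_score(a, b):
    return 1.0 if a == b else 0.0  # hypothetical stand-in, NOT BLOSUM62


def toy_time(s, t):
    return 1.0 + 0.1 * abs(s - t)  # hypothetical divergence times


site_i, site_j = "AAGGAG", "KKEEKE"  # two toy alignment columns (6 sequences)
scores_i = site_scores(site_i, toy_time, toy_score)
r = pearson(scores_i, site_scores(site_j, toy_time, toy_score))

# Stand-in null: permute site_j across sequences (CAPS uses simulated alignments instead)
rng = random.Random(1)
null = [pearson(scores_i, site_scores(rng.sample(site_j, len(site_j)), toy_time, toy_score))
        for _ in range(999)]
p = empirical_p(r, null)
print(round(r, 6), p)
```

With only six sequences the two toy sites correlate perfectly, yet the permutation p-value cannot approach the 0.001 threshold, which illustrates why adequate sample sizes matter for this test.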



Bootstrapping the pairs of coevolving sites 

In this study, we have devised a new method to determine the reliability of a pair of coevolving amino acid sites. This test is based upon the assumption that pairs of sites involved in important functional roles within a phylogenetic group should be inextricably linked to each other with regard to their evolutionary patterns, such that the two sites of the pair should be evolutionarily dependent on one another through their reciprocal natural selection. That is, a change in one amino acid should be accompanied by a compensatory (coadaptive) change in its coevolving amino acid partner. Conversely, pairs of amino acid sites that are consistently detected as coevolving in a phylogenetic context should be functionally related.

For each of the pairs of amino acid sites detected in our coevolutionary analyses, we performed non-parametric bootstrapping; that is, we randomly sampled sequences from the phylogenetic tree, performed the coevolutionary analyses for those sampled sequences using CAPS and then checked whether a particular pair of sites detected in the real coevolutionary analyses was also detected in this new sampled dataset. We replicated this procedure 1000 times and then asked how many times each of the pairs of sites detected as coevolving in the real multiple sequence alignments was detected as significantly coevolving. Those pairs that were identified in more than 70% of the phylogenetic random samples were deemed consistently coevolving amino acid sites.
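The resampling scheme can be sketched in Python as below. The subsample fraction, the use of sampling without replacement and the `toy_detect` function (a hypothetical stand-in for a CAPS run that flags site pairs whose residues map one-to-one) are assumptions for illustration; only the 1000 replicates and the 70% support threshold come from the text.

```python
import itertools
import random
from collections import Counter


def bootstrap_support(sequences, detect_pairs, observed_pairs, n_reps=1000, frac=0.8, seed=0):
    """Fraction of random subsamples in which each observed coevolving pair is re-detected.

    detect_pairs(sample) -> set of (i, j) site pairs with i < j (stand-in for a CAPS run).
    frac is a hypothetical subsample size, as a fraction of all sequences.
    """
    rng = random.Random(seed)
    k = max(2, round(frac * len(sequences)))
    counts = Counter()
    for _ in range(n_reps):
        found = detect_pairs(rng.sample(sequences, k))  # sampling without replacement
        counts.update(pair for pair in observed_pairs if pair in found)
    return {pair: counts[pair] / n_reps for pair in observed_pairs}


def consistent_pairs(support, threshold=0.70):
    """Pairs re-detected in more than `threshold` of the replicates."""
    return sorted(pair for pair, s in support.items() if s > threshold)


def toy_detect(seqs):
    """Hypothetical detector: flags site pairs whose residues map one-to-one."""
    found = set()
    for i, j in itertools.combinations(range(len(seqs[0])), 2):
        res_i = {s[i] for s in seqs}
        res_j = {s[j] for s in seqs}
        combos = {(s[i], s[j]) for s in seqs}
        if len(res_i) > 1 and len(combos) == len(res_i) == len(res_j):
            found.add((i, j))
    return found


# Toy alignment: sites 0 and 1 covary perfectly, site 2 varies independently
seqs = ["AKL", "AKV", "AKL", "AKI", "AKV", "GEL", "GEI", "GEV", "GEL", "GEI"]
support = bootstrap_support(seqs, toy_detect, [(0, 1), (0, 2)])
print(support, consistent_pairs(support))
```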



Measuring statistical independence of coevolutionary 
groups among bacteria 

To measure the statistical independence of one group of coevolving sites from another, we first calculated the entropy of the group ($D_S$):

$$D_S = f_S^{(a)} \ln\frac{f_S^{(a)}}{q^{(a)}} + \left(1 - f_S^{(a)}\right) \ln\frac{1 - f_S^{(a)}}{1 - q^{(a)}} \qquad (1)$$

Here $f_S^{(a)}$ is the frequency of the most represented amino acid ($a$) in each of the sites under coevolution ($i \in S$) within the group. This frequency is compared to the frequency of the amino acid ($a$) in all the proteins ($q^{(a)}$).

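Then, we measured the correlation entropy of the group ($I_S$) as:

$$I_S = D_S - \sum_{i \in S} D_i^{(a)} \qquad (2)$$

where $D_i^{(a)}$ is the relative entropy of the amino acid ($a$) at site $i$, calculated from its frequency $f_i^{(a)}$ at that site as:

$$D_i^{(a)} = f_i^{(a)} \ln\frac{f_i^{(a)}}{q^{(a)}} + \left(1 - f_i^{(a)}\right) \ln\frac{1 - f_i^{(a)}}{1 - q^{(a)}} \qquad (3)$$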

Two groups ($g_1$ and $g_2$) are independent of one another if their correlation entropies satisfy:

$$I_S(g_1, g_2) \approx I_S(g_1) + I_S(g_2) \qquad (4)$$
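Equations 1 to 4 can be sketched in Python as follows. Two points are assumptions not stated in the text: the joint frequency f_S(a) is taken as the fraction of sequences that carry the most represented residue at every site of the group simultaneously, and the background frequency for a multi-site group is taken as the product of the per-site background frequencies. The background frequencies in the example are hypothetical.

```python
import math
from collections import Counter


def rel_entropy(f, q):
    """f ln(f/q) + (1 - f) ln((1 - f)/(1 - q)), taking 0 ln 0 = 0; needs 0 < q < 1."""
    d = 0.0
    if f > 0:
        d += f * math.log(f / q)
    if f < 1:
        d += (1 - f) * math.log((1 - f) / (1 - q))
    return d


def most_represented(alignment, i):
    """Most frequent residue at alignment column i."""
    return Counter(seq[i] for seq in alignment).most_common(1)[0][0]


def group_entropy(alignment, sites, q):
    """D_S (eq. 1); with a single site this is D_i (eq. 3)."""
    top = {i: most_represented(alignment, i) for i in sites}
    # ASSUMPTION: f_S = share of sequences carrying the top residue at every site of S
    f_s = sum(all(seq[i] == top[i] for i in sites) for seq in alignment) / len(alignment)
    # ASSUMPTION: group background = product of per-site background frequencies
    q_s = math.prod(q[top[i]] for i in sites)
    return rel_entropy(f_s, q_s)


def correlation_entropy(alignment, sites, q):
    """I_S (eq. 2): D_S minus the sum of single-site terms D_i."""
    return group_entropy(alignment, sites, q) - sum(group_entropy(alignment, [i], q) for i in sites)


def theta(alignment, g1, g2, q):
    """I_S(g1, g2) - (I_S(g1) + I_S(g2)); close to 0 when the groups are independent (eq. 4)."""
    joint = correlation_entropy(alignment, list(g1) + list(g2), q)
    return joint - (correlation_entropy(alignment, g1, q) + correlation_entropy(alignment, g2, q))


# Toy alignment (4 sequences, 2 columns) and hypothetical background frequencies
aln = ["AK", "AK", "AK", "GE"]
q = {"A": 0.5, "K": 0.5, "G": 0.5, "E": 0.5}
print(correlation_entropy(aln, [0, 1], q))  # ln(4/3), about 0.2877
print(theta(aln, [0], [1], q))
```

A single site has zero correlation entropy by construction, so splitting the two covarying toy columns into separate groups leaves all of their coupling in the joint term.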

To determine the significance of the difference between both sides of equation 4, we built 1000 groups, each with the same size as the coevolution group; then, we estimated $I_S(g_1)$ and $I_S(g_2)$ and compared their sum to $I_S(g_1, g_2)$.
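The significance test can be sketched generically in Python, taking any function that returns Θ = I_S(g1, g2) − (I_S(g1) + I_S(g2)) for two groups (for example, one built from equations 1 to 4). Drawing both random groups without overlap from the sites of the same domain, and using a two-sided normal test, are assumptions about details the text leaves open; the toy Θ function and per-site weights in the example are hypothetical.

```python
import math
import random
import statistics


def independence_test(theta_fn, g1, g2, domain_sites, n_null=1000, seed=0):
    """Normal test of the observed Theta against Theta for random groups of the same sizes.

    theta_fn(g1, g2) -> float, e.g. I_S(g1, g2) - (I_S(g1) + I_S(g2)).
    domain_sites: list of sites from the same protein domain to sample from.
    Returns (observed_theta, z_score, two_sided_p).
    """
    rng = random.Random(seed)
    observed = theta_fn(g1, g2)
    null = []
    for _ in range(n_null):
        drawn = rng.sample(domain_sites, len(g1) + len(g2))  # non-overlapping random groups
        null.append(theta_fn(drawn[:len(g1)], drawn[len(g1):]))
    mu, sd = statistics.fmean(null), statistics.stdev(null)
    if sd == 0:
        z = 0.0 if observed == mu else math.inf
    else:
        z = (observed - mu) / sd
    return observed, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-site weights and toy Theta, for illustration only
weights = {site: 0.01 * (site % 7) for site in range(1, 101)}


def toy_theta(a, b):
    return 2 * sum(weights[s] for s in a) * sum(weights[s] for s in b)


obs, z, p = independence_test(toy_theta, [5, 6, 13], [20, 27], list(range(1, 101)))
print(f"Theta={obs:.4f} z={z:.2f} P={p:.3f}")
```

Under this sketch, a pair of groups would be flagged as significant when P < 0.05, matching the threshold given in the figure caption.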